Predicting Maximum Data Staleness in Real-Time Warehouses
Abstract
This paper presents an analysis technique for estimating maximum data staleness in a data warehouse that collects “near-real-time” data streams. Data is pushed to the warehouse from a variety of external sources with a wide range of inter-arrival times (e.g., once a minute to once a day). In prior work, ad hoc heuristic algorithms have been proposed for scheduling warehouse updates. In this paper, global multiprocessor real-time scheduling algorithms are considered as an alternative. It is shown that schedulability results concerning such algorithms can be used to analytically derive upper bounds on maximum data staleness based upon characteristics of warehouse tables and the parameters of update tasks. Simulation experiments are presented that show the effectiveness of the proposed approach.
Similar resources
Reduction of Materialized View Staleness Using Online Updates
Updating the materialized views stored in data warehouses usually implies making the warehouse unavailable to users. We propose MAUVE, a new algorithm for online incremental view updates that uses timestamps and allows consistent read-only access to the warehouse while it is being updated. The algorithm propagates the updates to the views more often than the typical once a day in order to reduce ...
HyDash: A Dashboard for Real-Time Business Intelligence based on the HyPer Main Memory Database System
Business Intelligence (BI) is a set of techniques that help improve business decision making. From a technical point of view, BI relies on a set of tools which includes performance dashboards: layered services that combine monitoring, analysis, and reporting. However, most dashboard solutions today are based on data warehouses which face a problem of data staleness—a circumstance caused by the ...
Probabilistically Bounded Staleness for Practical Partial Quorums
Modern storage systems employing quorum replication are often configured to use partial, non-strict quorums. These systems wait only for a subset of their replicas to respond to a request before returning an answer, without guaranteeing that read and write replica sets intersect. While these partial quorum mechanisms provide only basic eventual consistency guarantees, with no limit to the recen...
Defining and Measuring Data-Driven Quality Dimension of Staleness
With the growing complexity of data acquisition and processing methods, there is an increasing demand for understanding which data is outdated and how to keep it as fresh as possible. Staleness is one of the key time-related data quality characteristics; it represents the degree of synchronization between data originators and the information systems possessing the data. However, nowadays there is ...
Real Time Pseudo-Range Correction Predicting by a Hybrid GASVM model in order to Improve RTDGPS Accuracy
A differential base station is sometimes unable to send correction information for minutes at a time, due to radio interference or loss of signal. To overcome the degradation caused by the loss of the Differential Global Positioning System (DGPS) Pseudo-Range Correction (PRC), the PRC can be predicted. In this paper, the Support Vector Machine (SVM) and Genetic Algorithms (GAs) will be incorpor...